
    Transcript identification from deep sequencing data

    Ribonucleic acid (RNA) sequences are polymeric molecules ubiquitous in every living cell. RNA molecules mediate the flow of information from the DNA sequence to most functional elements in the cell. Therefore, it is of great interest in biological and biomedical research to associate RNA molecules with a biological function and to understand the mechanisms of their regulation. The goal of this study is the characterization of the RNA sequence composition of biological samples (the transcriptome) to facilitate the understanding of RNA function and regulation. Traditionally, a similar task has been addressed by algorithms called gene finding systems, which predict RNA sequences (transcripts) from features of the genomic DNA sequence. Lacking sufficient experimental evidence for most of the genes, these systems learn sequence patterns on a few genes with direct evidence to identify many additional genes in the genome. High-throughput sequencing of RNA (RNA-Seq) has recently become a powerful technology for studying the transcriptome. This technology identifies millions of short RNA fragments (reads of ≈100 letters in length), holding direct evidence for a large fraction of the genes. However, the analysis of RNA-Seq data faces profound challenges. Firstly, the distribution of RNA-Seq reads is highly uneven among genes, resulting in a considerable fraction of genes with very few reads, and the stochastic nature of the technology leads to gaps even for well-covered genes. To accurately predict transcripts in cases with incomplete evidence, we need to combine RNA-Seq evidence with features derived from the genomic DNA sequence. We therefore developed a method to learn the integration of both information sources and implemented this strategy as an extension of the gene finder mGene. The system, now called mGene.ngs, determines close approximations of potentially non-linear transformations for all features on the training set, such that the prediction performance is maximized. With this ability, which is to our knowledge unique among gene finding systems, mGene.ngs can not only learn complex relationships between the two mentioned information sources, but also gains the flexibility to take many additional information sources into account. mGene.ngs has been independently evaluated within the context of an international competition (RGASP) for RNA-Seq-based reannotation and has shown very favourable performance for two out of three model organisms. Moreover, we generated and analyzed RNA-Seq-based annotations for 20 Arabidopsis thaliana strains to facilitate a deeper understanding of phenotypic variation in this natural plant population. A second major challenge in transcriptome reconstruction lies in the complexity of the transcriptome itself. A process called alternative splicing generates multiple mature RNA sequences from a single primary RNA sequence by cutting out so-called introns, typically in a tightly regulated manner. The inference algorithms of almost all gene finding systems are limited to predicting transcripts that do not overlap in their genomic region of origin. To overcome this limitation, purely RNA-Seq-based approaches have been developed. However, biologically implausible assumptions or the neglect of available information have often led to unsatisfactory results. A major contribution of this study is the integer optimization-based transcriptome reconstruction approach MiTie. MiTie utilizes a biologically motivated loss function, can take advantage of a priori known genome annotations, and gains predictive power by considering multiple RNA-Seq samples simultaneously. Based on simulated data for the human genome as well as on an extensive RNA-Seq data set for the model organism Drosophila melanogaster, we show that MiTie predicts transcripts significantly more accurately than state-of-the-art methods such as Cufflinks and Trinity.

    Transcript quantification with RNA-Seq data

    Motivation: Novel high-throughput sequencing technologies open exciting new approaches to transcriptome profiling. Sequencing transcript populations of interest, e.g. from different tissues or variable stress conditions, with RNA sequencing (RNA-Seq) [1] generates millions of short reads. Accurately aligned to a reference genome, they provide digital counts and thus facilitate transcript quantification. As the observed read counts only provide the summation of all expressed sequences at one locus, the inference of the underlying transcript abundances is crucial for further quantitative analyses. Methods: To approach this problem, we have developed a new technique, called rQuant, based on quadratic programming. Given a gene annotation and position-wise exon/intron read coverage from read alignments, we determine the abundances for each annotated transcript by minimising a suitable loss function. It penalises the deviation of the observed from the expected read coverage given the transcript weights. The observed read coverage is typically non-uniformly distributed over the transcript due to several biases in the generation of the sequencing libraries and the sequencing. This leads to distortions of the transcript abundances, if not corrected properly. We therefore extended our approach to jointly optimise transcript profiles, modeling the coverage deviations depending on the position in the transcript. Our method can be applied without knowledge of the underlying transcript abundances and equally benefits from loci with and without alternative transcripts. Results: To quantitatively evaluate the quality of our abundance predictions, we used a set of simulated reads from transcripts with known expression as a benchmark set. It was generated using the Flux Simulator [2] modeling biases in RNA-Seq as well as preparation experiments. Table 1 shows preliminary results with segment- and position-based loss as well as with and without the transcript profiles. Our results indicate that the position-based modeling together with transcript profiles allows us to accurately infer the underlying expression of single transcripts as well as of multiple isoforms of one gene locus.
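
The core idea of the quantification step described above (choose non-negative transcript weights so that the expected coverage matches the observed coverage) can be illustrated with a small sketch. This is not the rQuant implementation: it assumes a plain squared-error loss, binary transcript profiles without bias correction, and it swaps the quadratic-programming solver for simple projected gradient descent. The function name `estimate_abundances` and the toy data are hypothetical.

```python
def estimate_abundances(profiles, observed, iterations=2000):
    """Fit non-negative transcript weights w minimising
    sum_p (observed[p] - sum_t w[t] * profiles[t][p]) ** 2.

    profiles[t][p] is 1 if transcript t covers position (or segment) p, else 0.
    Assumption: squared-error loss solved by projected gradient descent,
    used here as a stand-in for a quadratic-programming solver.
    """
    n_transcripts = len(profiles)
    n_positions = len(observed)
    # The squared Frobenius norm bounds the largest eigenvalue of the
    # quadratic form, so this step size is small enough to converge.
    frobenius_sq = sum(v * v for row in profiles for v in row)
    if frobenius_sq == 0:
        return [0.0] * n_transcripts
    step = 1.0 / (2.0 * frobenius_sq)

    weights = [0.0] * n_transcripts
    for _ in range(iterations):
        expected = [
            sum(weights[t] * profiles[t][p] for t in range(n_transcripts))
            for p in range(n_positions)
        ]
        residual = [e - o for e, o in zip(expected, observed)]
        for t in range(n_transcripts):
            grad = 2.0 * sum(profiles[t][p] * residual[p] for p in range(n_positions))
            # Gradient step followed by projection onto weights >= 0.
            weights[t] = max(0.0, weights[t] - step * grad)
    return weights


# Toy locus: two isoforms sharing segments 0 and 2.
profiles = [[1, 1, 1, 0],
            [1, 0, 1, 1]]
observed = [8, 3, 8, 5]  # coverage generated from weights 3 and 5
weights = estimate_abundances(profiles, observed)
```

In this toy locus the shared segments only constrain the sum of the two weights, while the isoform-specific segments (1 and 3) separate them, which is why loci with alternative transcripts remain identifiable.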

    Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data : From Seventh International Society for Computational Biology (ISCB) Student Council Symposium 2011 Vienna, Austria. 15 July 2011

    First published by BioMed Central: Schultheiss, Sebastian J.; Jean, Géraldine; Behr, Jonas; Bohnert, Regina; Drewe, Philipp; Görnitz, Nico; Kahles, André; Mudrakarta, Pramod; Sreedharan, Vipin T.; Zeller, Georg; Rätsch, Gunnar: Oqtans: a Galaxy-integrated workflow for quantitative transcriptome analysis from NGS Data - In: BMC Bioinformatics. - ISSN 1471-2105 (online). - 12 (2011), suppl. 11, art. A7. - doi:10.1186/1471-2105-12-S11-A7

    Breeding territory of the Rusty-collared Seedeater (Sporophila collaris) in southern Brazil: selection and description of nesting and feeding areas

    The variables that influence the selection of breeding territories by Brazilian populations of the Rusty-collared Seedeater (Sporophila collaris) are poorly known, as are descriptions of its nesting areas. During the 2015–2016 breeding season we collected breeding data on S. collaris in the grasslands of southern Brazil in order to estimate territory size, comparing two estimation methods, and to characterize the structure and floristic composition of the microhabitats that are important in the selection of breeding territories. Additionally, we identified the main plant species used as food within the breeding territories. We monitored 32 pairs, which established their territories in humid seminatural environments (56%), dry seminatural environments (25%), and wetlands (19%). Breeding territories were always associated with a water body: 59% were close to artificial irrigation channels, 38% to dams, and 3% to rice crops. The estimated mean size of the breeding territories was 1.46 ha (Minimum Convex Polygon) and 3.22 ha (Kernel Density Estimation). We recorded 66 plant species in the samples studied, 15 of them exclusive to nest samples and seven exclusive to non-nest samples. Cover of the middle and upper vegetation strata, as well as vegetation height and the presence of water, were the most important characteristics associated with the selection of breeding territories. Of the 22 plant species that composed the diet, Poaceae accounted for 70%. We stress the need for more studies of breeding territories for species in Brazil and recommend caution when evaluating territory size estimates based on different methods.
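
As a rough illustration of the Minimum Convex Polygon estimate mentioned above, the territory area can be computed as the area of the convex hull of the recorded locations. The sketch below assumes planar coordinates in metres (for example, projected GPS fixes); the function names and the sample points are hypothetical and are not taken from the study.

```python
def convex_hull(points):
    """Return the convex hull (counter-clockwise) using Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def mcp_area_ha(points_m):
    """Minimum Convex Polygon area in hectares for (x, y) tuples in metres."""
    hull = convex_hull(points_m)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1  # shoelace formula
    return abs(twice_area) / 2.0 / 10_000.0  # 1 ha = 10,000 m^2


# Hypothetical fixes: corners of a 100 m x 100 m square plus one interior point.
fixes = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50)]
area = mcp_area_ha(fixes)
```

Because the polygon depends only on the outermost locations, it ignores how intensively the interior is used. A kernel density estimate instead smooths over all locations, which is one reason the two methods can give different sizes for the same territory.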

    Network-based integration of multi-omics data for prioritizing cancer genes

    Several molecular events are known to be cancer-related, including genomic aberrations, hypermethylation of gene promoter regions, and differential expression of microRNAs. These aberration events are very heterogeneous across tumors and it is poorly understood how they affect the molecular makeup of the cell, including the transcriptome and proteome. Protein interaction networks can help decode the functional relationship between aberration events and changes in gene and protein expression. We developed NetICS (Network-based Integration of Multi-omics Data), a new graph diffusion-based method for prioritizing cancer genes by integrating diverse molecular data types on a directed functional interaction network. NetICS prioritizes genes by their mediator effect, defined as the proximity of the gene to upstream aberration events and to downstream differentially expressed genes and proteins in an interaction network. Genes are prioritized for individual samples separately and integrated using a robust rank aggregation technique. NetICS provides a comprehensive computational framework that can aid in explaining the heterogeneity of aberration events by their functional convergence to common differentially expressed genes and proteins. We demonstrate NetICS' competitive performance in predicting known cancer genes and in generating robust gene lists using TCGA data from five cancer types. NetICS is available at https://github.com/cbg-ethz/netics. Supplementary data are available at Bioinformatics online.
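
To make the notion of network proximity by graph diffusion more concrete, here is a minimal sketch of a random walk with restart on a small directed network. It is a generic illustration and not the NetICS code: the restart probability, the handling of nodes without outgoing edges, the function name `diffuse`, and the toy network are all assumptions made for this example.

```python
def diffuse(nodes, edges, seeds, restart=0.4, iterations=200):
    """Random walk with restart on a directed graph.

    nodes: list of node names; edges: list of (source, target) pairs;
    seeds: nodes where the walk starts (e.g. genes with aberration events).
    Returns a dict node -> stationary visiting probability.
    Assumption: walkers at nodes without outgoing edges jump back to the seeds.
    """
    out_neighbors = {n: [] for n in nodes}
    for source, target in edges:
        out_neighbors[source].append(target)

    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        flow = {n: 0.0 for n in nodes}
        dangling = 0.0
        for n in nodes:
            targets = out_neighbors[n]
            if targets:
                share = scores[n] / len(targets)
                for t in targets:
                    flow[t] += share
            else:
                dangling += scores[n]
        scores = {
            n: restart * start[n] + (1.0 - restart) * (flow[n] + dangling * start[n])
            for n in nodes
        }
    return scores


# Toy network: aberrant gene A acts on B, which acts on C; D is disconnected.
scores = diffuse(["A", "B", "C", "D"], [("A", "B"), ("B", "C")], seeds={"A"})
```

In this toy network B scores higher than C because it is closer to the seed, and the disconnected node D receives no score. Running the same diffusion on the reversed edges, seeded at differentially expressed genes, would give a corresponding downstream proximity; how NetICS combines the two directions is described in the paper rather than in this sketch.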

    Assessment of the birds seized and voluntarily surrendered in the Central Region of Rio Grande do Sul, Brazil

    Wildlife trafficking is an old practice, defined as the removal of free-living specimens so that they can be sold. In this paper, we present data from a qualitative and quantitative survey of the wild bird species seized by, or voluntarily surrendered to, the environmental authorities in Santa Maria (RS). To this end, we analyzed the wild bird seizure records of the Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais Renováveis (IBAMA) and of the Segunda Companhia Ambiental da Brigada Militar, which operate in the central region of the state of Rio Grande do Sul, for the years 2003, 2004 and 2005. In total, 1,120 birds were seized and 60 were voluntarily surrendered to the two enforcement agencies. The species most affected by trafficking was the Monk Parakeet (Myiopsitta monachus), followed by Paroaria coronata, Nothura maculosa, Cyanoloxia brissonii and Carduelis magellanica, which together accounted for 57% of all seizures. We found that IBAMA was the agency that seized the most birds, and that public awareness is still low, since voluntary surrenders by the public are very few compared with seizures.

    The Possible "Proton Sponge" Effect of Polyethylenimine (PEI) Does Not Include Change in Lysosomal pH.

    Polycations such as polyethylenimine (PEI) are used in many novel nonviral vector designs and there are continuous efforts to increase our mechanistic understanding of their interactions with cells. Even so, the mechanism of polyplex escape from the endosomal/lysosomal pathway after internalization is still elusive. The “proton sponge” hypothesis remains the most generally accepted mechanism, although it is heavily debated. This hypothesis is associated with the large buffering capacity of PEI and other polycations, which has been interpreted to cause an increase in lysosomal pH even though no conclusive proof has been provided. In the present study, we have used a nanoparticle pH sensor that was developed for pH measurements in the endosomal/lysosomal pathway. We have carried out quantitative measurements of lysosomal pH as a function of PEI content and correlated the results with the “proton sponge” hypothesis. Our measurements show that PEI does not induce a change in lysosomal pH as previously suggested, and quantification of PEI concentrations in lysosomes makes it uncertain that the “proton sponge” effect is the dominant mechanism of polyplex escape.